
Added: 08/05/2012 by John

How to - Copy millions of tiny files through a network

I had to copy millions and millions of tiny image files, about 5KB each and more than a terabyte in total, over a network, fast.

SCP, NFS, FTPFS and SSHFS were all too slow to work with. All these methods copy every single file piece by piece, causing lots of overhead; I never reached more than 1MB/s over a gigabit line.

The fastest solution I came up with after lots and lots of testing was this:

Receiving server:
ttcp -r -p 10112 | cpio -i -d -m

Sending server:
find /home/files -print | cpio -o | ttcp -t receivinghost.com -p 10112

You need TTCP and cpio for this task; yum/apt-get install cpio ttcp works fine (Red Hat systems need the RPMforge repository). You can also build these tools from source.
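
For reference, the install commands that note refers to look roughly like this (package names follow the note above and may differ per distribution; building from source is the fallback):

# Red Hat / CentOS (with the RPMforge repository enabled)
yum install cpio ttcp

# Debian / Ubuntu
apt-get install cpio ttcp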

What does it do?

ttcp opens a raw TCP listener on the receiving host, with a cpio instance at the end of the pipe.
The sending server packs your files into a cpio stream, ttcp transfers it to the receiving host, and cpio 'extracts' it there.
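
The same two commands again, with the flags annotated (these are standard ttcp and GNU cpio options):

# Receiving server: -r = receive mode, -p = listen on port 10112;
# cpio -i = extract, -d = create missing directories, -m = keep modification times
ttcp -r -p 10112 | cpio -i -d -m

# Sending server: find prints every path, cpio -o packs them into one stream,
# ttcp -t = transmit the stream to the receiving host on port 10112
find /home/files -print | cpio -o | ttcp -t receivinghost.com -p 10112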

Why CPIO? Why TTCP?

CPIO is, in my experience, slightly faster than TAR, and it has no problems with long filenames. When I used TAR, the long filenames became a problem.
TTCP has almost no overhead. It is not secure like SSH, but SSH adds lots of overhead and consumes a lot of CPU power (which can become a bottleneck). TTCP was way faster for this task, and it gives you a little extra: the transfer speed afterwards (TTCP was originally written for bandwidth testing).

Limits and other problems

The biggest bottleneck is the read/access speed of the hard disk. All those small files (in my case with very long names) take a long time to read. An SSD will do much better. I also found that the XFS filesystem performs much better than EXT3/4 when reading all those small files: with EXT4 I got around 4MB/s, while with XFS I got over 12MB/s.
Another problem I ran into was inodes. All those millions of files eat up inodes very fast. After a few hours I could not write to the hard disk anymore, even though it had over 1.5TB of free space: the inodes were all used up. So be sure to create your filesystem with plenty of inodes; a quick check is shown below.
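
A quick way to check for and avoid the inode problem before you start copying (the device and mount point below are placeholders):

# how many inodes are free on the target filesystem?
df -i /home/files

# XFS allocates inodes dynamically, so a plain mkfs is enough
mkfs.xfs /dev/sdb1

# if you stay on ext4, reserve more inodes at mkfs time,
# e.g. one inode per 4KB of space for ~5KB files
mkfs.ext4 -i 4096 /dev/sdb1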

Other solution 1 - A little compression

This solution saves a little bandwidth by gzipping the stream. It only helps if you transfer text/html/sql files; images, mp3s and other already compressed formats will just cost extra time and overhead.
When copying lots of small files, this solution makes the CPU the bottleneck.

Receiving server:
ttcp -r -p 10112 | gunzip | cpio -i -d -m

Sending server:
find /home/files -print | cpio -o | gzip | ttcp -t receivinghost.com -p 10112
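
Not part of the original recipe, but if the CPU does become the bottleneck and pigz (a parallel gzip) happens to be installed, it is a drop-in replacement for gzip in this pipeline:

Receiving server:
ttcp -r -p 10112 | pigz -d | cpio -i -d -m

Sending server:
find /home/files -print | cpio -o | pigz | ttcp -t receivinghost.com -p 10112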

Other solution 2 - Secure

This solution is secure: it connects to the other server over SSH instead of TTCP. Cons: SSH uses a lot of CPU power and has lots of overhead.

Receiving server (pulls the files from the sending host):
ssh user@host "find /home/files/ -depth -print | cpio -oaV" | cpio -imVd
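
The same transfer as a push, started from the sending server instead (the user, host and target directory below are placeholders):
find /home/files -depth -print | cpio -oaV | ssh user@receivinghost.com 'cd /home/files && cpio -imVd'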

Other solution 3 - TAR over TTCP

If you prefer TAR over CPIO, this is how you can use TAR over TTCP. Cons: if you are dealing with long filenames, TAR can refuse to work.
You can use 'tar -xzf -' and 'tar -czf -' to add compression (see the sketch after the commands below). Be sure not to add '-v': while verbose is great for seeing how far you are, it is way, way slower. How come? Verbose prints every filename, and all those filenames have to be transmitted to your terminal; tar waits until each one has been written, which slows the whole transfer down.

Receiving server:
ttcp -r -p 10208 | tar xf -

Sending server:
tar -cf - /home/files | ttcp -t receivinghost.com -p 10208
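
The compressed variant mentioned above would look like this (gzip on the fly; only worthwhile for compressible files):

Receiving server:
ttcp -r -p 10208 | tar -xzf -

Sending server:
tar -czf - /home/files | ttcp -t receivinghost.com -p 10208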

Other solution 4 - TAR over SSH

Need it to be secure with tar? Use tar over SSH. Cons: if you are dealing with long filenames, TAR can refuse to work, and SSH has lots of overhead.
As with solution 3, you can add compression with 'tar -czf -' / 'tar -xzf -' (see the sketch below), and you should skip '-v' for the same reason.

Sending server (pushing seems slightly faster than pulling):
tar -cf - /home/files | ssh receivinghost.com -p 22 'tar -xf - -C /home/files/'
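
With the compression flags added (again, skip '-v'):
tar -czf - /home/files | ssh receivinghost.com 'tar -xzf - -C /home/files/'

If SSH's CPU use is the real problem, older OpenSSH builds also let you pick a cheaper cipher with 'ssh -c', which may or may not still be allowed by your configuration.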

Other solution 5 - DD

The fastest way to do this is by using DD. DD reads the disk at block level, not at filesystem level. For me, DD was not an option:
1. I needed to switch to the XFS filesystem (for read speed and to avoid the inode problems), and DD just 'clones' your hard disk, filesystem and all. I also did not have the space to store the whole image (a 3TB disk), mount it and copy the files from there (and I doubt that would have been any faster).
2. The files were on a live production server. To use DD safely, the filesystem needs to be unmounted, which needs to be avoided on live production servers.
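
For completeness, a block-level copy over TTCP would look roughly like the sketch below. I did not use this; the device names are placeholders, the target disk must be at least as large as the source, and the source filesystem should be unmounted (or mounted read-only) first.

Receiving server:
ttcp -r -p 10112 | dd of=/dev/sdb bs=1M

Sending server:
dd if=/dev/sda bs=1M | ttcp -t receivinghost.com -p 10112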
